1.
Knowl Inf Syst ; 65(5): 2159-2186, 2023.
Article in English | MEDLINE | ID: covidwho-2174402

ABSTRACT

Domain-specific document collections, such as data sets about the COVID-19 pandemic, politics, and sports, have become more common as platforms grow and develop better ways to connect people whose interests align. These data sets come from many different sources, ranging from traditional sources like open-ended surveys and newspaper articles to the dozens of online social media platforms. Most topic models are equipped to generate topics from one or more of these data sources, but models rarely work well across all types of documents. The main problem that many models face is the varying noise levels inherent in different types of documents. We propose topic-noise models, a new type of topic model that jointly models topic and noise distributions to produce a more accurate, flexible representation of documents regardless of their origin and varying qualities. Our topic-noise model, Topic Noise Discriminator (TND), approximates topic and noise distributions side-by-side with the help of word embedding spaces. While topic-noise models are important for the types of short, noisy documents that often originate on social media platforms, TND can also be used with more traditional data sources like newspapers. TND itself generates a noise distribution that, when ensembled with other generative topic models, can produce more coherent and diverse topic sets. We show the effectiveness of this approach using Latent Dirichlet Allocation (LDA), and demonstrate the ability of TND to improve the quality of LDA topics in noisy document collections. Finally, researchers are beginning to generate topics using multiple sources and finding that they need a way to identify a core set based on text from different sources. We propose using cross-source topic blending (CSTB), an approach that maps topic sets to an s-partite graph and identifies core topics that blend topics from across s sources by identifying subgraphs with certain linkage properties. We demonstrate the effectiveness of topic-noise models and CSTB empirically on large real-world data sets from multiple domains and data sources.
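
The abstract does not specify which linkage properties CSTB uses, so the following is only a rough sketch of the general idea of blending topics over an s-partite graph, not the authors' method. It assumes, purely for illustration, that two topics from different sources are linked when the Jaccard overlap of their word lists meets a threshold, and that a "core topic" is any connected group of linked topics spanning at least `min_sources` sources. The names `jaccard`, `blend_topics`, `threshold`, and `min_sources` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two word lists (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def blend_topics(topics_by_source, threshold=0.3, min_sources=2):
    """Group topics across sources; topics_by_source maps source -> list of word lists."""
    # Each node is (source, topic_index). Edges only join nodes from
    # different sources, which makes the graph s-partite.
    nodes = [(s, i) for s, ts in topics_by_source.items() for i in range(len(ts))]
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in combinations(nodes, 2):
        if u[0] == v[0]:
            continue  # never link two topics from the same source
        sim = jaccard(topics_by_source[u[0]][u[1]], topics_by_source[v[0]][v[1]])
        if sim >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)

    core = []
    for members in groups.values():
        # Assumed linkage property: the group must span enough sources.
        if len({s for s, _ in members}) >= min_sources:
            words = sorted({w for s, i in members for w in topics_by_source[s][i]})
            core.append({"members": sorted(members), "words": words})
    return core
```

For example, given one topic about vaccines from each of two sources and otherwise unrelated topics, the sketch returns a single blended topic containing the union of the two vaccine word lists.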

2.
J Comput Soc Sci ; 3(2): 343-366, 2020.
Article in English | MEDLINE | ID: covidwho-953262

ABSTRACT

This article investigates the prevalence of high- and low-quality URLs shared on Twitter when users discuss COVID-19. We distinguish between high-quality health sources, traditional news sources, and low-quality misinformation sources. We find that misinformation, in terms of tweets containing URLs from low-quality misinformation websites, is shared at a higher rate than tweets containing URLs from high-quality health information websites. However, both are a relatively small proportion of the overall conversation. In contrast, news sources are shared at a much higher rate. These findings lead us to analyze the network created by the URLs referenced on the webpages shared by Twitter users. When looking at the combined network formed by all three of the source types, we find that the high-quality health information network, the low-quality misinformation network, and the news information network are all well connected with a clear community structure. While high- and low-quality sites do have connections to each other, the connections to and from news sources are more common, highlighting the central brokerage role news sources play in this information ecosystem. Our findings suggest that while low-quality URLs are not extensively shared in the COVID-19 Twitter conversation, a well-connected community of low-quality COVID-19-related information has emerged on the web, and both health and news sources are connecting to this community.
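
As a minimal sketch of the kind of comparison this abstract describes, the following computes the share of URLs falling into each source type. It is not the study's pipeline: the domain lists are placeholder `.example` hosts, and the names `categorize` and `share_rates` are hypothetical; the article's actual source lists and matching rules are not given in the abstract.

```python
from collections import Counter
from urllib.parse import urlparse

# Placeholder domain lists (assumptions, not the study's lists).
HEALTH = {"health.example"}
NEWS = {"news.example"}
MISINFO = {"misinfo.example"}

CATEGORIES = ("health", "news", "misinfo", "other")


def categorize(url):
    """Map a URL to a source type by its host name."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in HEALTH:
        return "health"
    if host in NEWS:
        return "news"
    if host in MISINFO:
        return "misinfo"
    return "other"


def share_rates(urls):
    """Return the fraction of URLs in each category (empty dict if no URLs)."""
    if not urls:
        return {}
    counts = Counter(categorize(u) for u in urls)
    return {c: counts[c] / len(urls) for c in CATEGORIES}
```

Matching on the exact host keeps the sketch simple; a real analysis would likely also need to resolve shortened links and handle subdomains.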

3.
ArXiv ; 2020 Mar 31.
Article in English | MEDLINE | ID: covidwho-831508

ABSTRACT

Since December 2019, COVID-19 has been spreading rapidly across the world. Not surprisingly, conversation about COVID-19 is also increasing. This article is a first look at the amount of conversation taking place on social media, specifically Twitter, with respect to COVID-19, the themes of discussion, where the discussion is emerging from, myths shared about the virus, and how much of it is connected to other high- and low-quality information on the Internet through shared URL links. Our preliminary findings suggest that a meaningful spatio-temporal relationship exists between information flow and new cases of COVID-19, and while discussions about myths and links to poor-quality information exist, their presence is less dominant than other crisis-specific themes. This research is a first step toward understanding social media conversation about COVID-19.
